Empty Categories in a Hindi Treebank

نویسندگان

  • Archna Bhatia
  • Rajesh Bhatt
  • Bhuvana Narasimhan
  • Martha Palmer
  • Owen Rambow
  • Dipti Misra Sharma
  • Michael Tepper
  • Ashwini Vaidya
  • Fei Xia
چکیده

We are in the process of creating a multi-representational and multi-layered treebank for Hindi/Urdu (Palmer et al., 2009), which has three main layers: dependency structure, predicate-argument structure (PropBank), and phrase structure. This paper discusses an important issue in treebank design which is often neglected: the use of empty categories (ECs). All three levels of representation make use of ECs. We make a high-level distinction between two types of ECs, trace and silent, on the basis of whether they are postulated to mark displacement or not. Each type is further refined into several subtypes based on the underlying linguistic phenomena which the ECs are introduced to handle. This paper discusses the stages at which we add ECs to the Hindi/Urdu treebank and why. We investigate methodically the different types of ECs and their role in our syntactic and semantic representations. We also examine our decisions whether or not to coindex each type of ECs with other elements in the representation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Statistical Approach to Prediction of Empty Categories in Hindi Dependency Treebank

In this paper we use statistical dependency parsing techniques to detect NULL or Empty categories in the Hindi sentences. We have currently worked on Hindi dependency treebank which is released as part of COLINGMTPIL 2012 Workshop. Earlier Rule based approaches are employed to detect Empty heads for Hindi language but statistical learning for automatic prediction is not explored. In this approa...

متن کامل

Empty Categories in Hindi Dependency Treebank: Analysis and Recovery

In this paper, we first analyze and classify the empty categories in a Hindi dependency treebank and then identify various discovery procedures to automatically detect the existence of these categories in a sentence. For this we make use of lexical knowledge along with the parsed output from a constraint based parser. Through this work we show that it is possible to successfully discover certai...

متن کامل

Empty Argument Insertion in the Hindi PropBank

This paper examines both linguistic behavior and practical implications of empty argument insertion in the Hindi PropBank. The Hindi PropBank is annotated on the Hindi Dependency Treebank, which contains some empty categories but rarely the empty arguments of verbs. In this paper, we analyze four kinds of empty arguments, *PRO*, *REL*, *GAP*, *pro*, and suggest effective ways of annotating thes...

متن کامل

Keeping it Simple: Generating Phrase Structure Trees from a Hindi Dependency Treebank

Converting a treebank from one representation type to another poses several challenges [4] [3]. These challenges are contingent on (amongst other things) the information encoded in source representation and the information required in target representation. In this paper, we propose a conversion algorithm that converts the Hindi-Urdu Dependency Treebank (HUTB) to a Phrase Structure (PS) represe...

متن کامل

Using CCG categories to improve Hindi dependency parsing

We show that informative lexical categories from a strongly lexicalised formalism such as Combinatory Categorial Grammar (CCG) can improve dependency parsing of Hindi, a free word order language. We first describe a novel way to obtain a CCG lexicon and treebank from an existing dependency treebank, using a CCG parser. We use the output of a supertagger trained on the CCGbank as a feature for a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010